Credit Card Users Churn Prediction for Thera bank

Problem Statement:

Thera Bank recently saw a steep decline in the number of its credit card users. Customers leaving the credit card service cost the bank revenue, so the bank wants to analyze its customer data to identify which customers are likely to leave the service and why, so that it can improve in those areas.

Objective:

Provide insight that will help the bank improve its services, so that customers do not renounce their credit cards.

Data Description:

Libraries

Read Dataset

Data Info/Details

Initial EDA

Univariate

Bivariate

Data pre-processing

Feature Engineering

Encoding Target Variable
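A minimal sketch of the target encoding, assuming the dataset's `Attrition_Flag` column uses the labels "Attrited Customer" and "Existing Customer" (the attrited class is mapped to 1 since it is the positive class we want to catch):

```python
import pandas as pd

# Tiny stand-in frame; the notebook applies this to the full dataset.
df = pd.DataFrame(
    {"Attrition_Flag": ["Existing Customer", "Attrited Customer", "Existing Customer"]}
)

# 1 = attrited (positive class), 0 = existing.
df["Attrition_Flag"] = df["Attrition_Flag"].map(
    {"Attrited Customer": 1, "Existing Customer": 0}
)
```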

Splitting Data
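A sketch of a train/validation/test split on toy data standing in for the bank's features and encoded target (the ~16% minority rate is an assumption roughly matching the dataset's class balance); `stratify` keeps the attrited/existing ratio the same in every split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix and imbalanced binary target; in the notebook, X holds
# the customer features and y the encoded Attrition_Flag.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.16).astype(int)

# Hold out a test set first, then carve a validation set out of the remainder.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=1, stratify=y_temp
)
```

This yields a 60/20/20 split: 0.25 of the remaining 80% is again 20% of the whole.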

Missing-Value Treatment

Encoding Categorical Variables
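One way to encode the categorical predictors is `pd.get_dummies`; the column names below are illustrative stand-ins for the dataset's categorical features:

```python
import pandas as pd

# Small illustrative frame; the notebook applies this to columns such as
# Gender, Education_Level, Marital_Status, etc.
df = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Marital_Status": ["Married", "Single", "Married"],
})

# drop_first=True removes one redundant (perfectly collinear) dummy per column.
encoded = pd.get_dummies(df, drop_first=True)
```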

Building Models

Model evaluation criterion:

Model Prediction Errors

  1. Predicting a customer as attrited when they are actually an existing customer (false positive).
  2. Predicting a customer as existing when they have actually attrited (false negative).

Which is more important?

We want a model that accurately identifies attrited customers, so we want to minimize false negatives.

Which metric to optimize?

Since we want to minimize false negatives, we will emphasize the recall score.
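Recall is TP / (TP + FN): the share of truly attrited customers the model actually catches. A quick worked example with made-up labels:

```python
from sklearn.metrics import recall_score

# 1 = attrited, 0 = existing (illustrative labels, not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# 3 of the 4 truly attrited customers are caught -> recall = 0.75.
recall = recall_score(y_true, y_pred)
```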

Oversampling train data using SMOTE

Undersampling train data using Random Under Sampler

Functions for scoring and matrix
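A sketch of what such helpers might look like (the function names are assumptions, not the notebook's actual code): one returns the metrics compared across models, the other the confusion matrix.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

def model_scores(model, X, y):
    """Return the comparison metrics for a fitted model, keyed by name."""
    pred = model.predict(X)
    return {
        "accuracy": accuracy_score(y, pred),
        "recall": recall_score(y, pred),
        "precision": precision_score(y, pred),
        "f1": f1_score(y, pred),
    }

def model_matrix(model, X, y):
    """Confusion matrix as [[TN, FP], [FN, TP]]."""
    return confusion_matrix(y, model.predict(X))
```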

Bagging

Bagging on over-sampled train data

Bagging on under-sampled train data

Random Forest

Random Forest on over-sampled train data

Random Forest on under-sampled train data

Gradient Boost

Gradient Boost on over-sampled train data

Gradient Boost on under-sampled train data

AdaBoost

AdaBoost on over-sampled train data

AdaBoost on under-sampled train data

XGBoost

XGBoost on over-sampled train data

XGBoost on under-sampled train data

Decision Tree

Decision Tree on over-sampled train data

Decision Tree on under-sampled train data

I like Gradient Boost Under, AdaBoost Under, and Bagging Under: their recall scores are high and comparable between the train and validation sets.

Tuning GBoost Under, AdaBoost Under, and Bagging Under with RandomizedSearchCV
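A hedged sketch of the tuning step for the Gradient Boost model; the search space and `n_iter` below are illustrative, not the notebook's actual grid. `scoring="recall"` makes the search pick the candidate with the best cross-validated recall, matching the chosen metric.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Stand-in for the under-sampled train data.
rng = np.random.default_rng(1)
X_under = rng.normal(size=(120, 4))
y_under = rng.integers(0, 2, size=120)

# Illustrative search space -- the notebook's actual grids may differ.
param_dist = {
    "n_estimators": randint(50, 200),
    "max_depth": randint(2, 6),
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

# Sample 5 random candidates and keep the one with the best CV recall.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=1),
    param_distributions=param_dist,
    n_iter=5,
    scoring="recall",
    cv=3,
    random_state=1,
    n_jobs=-1,
)
search.fit(X_under, y_under)
best_model = search.best_estimator_
```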

Tuned Bagging Under

Tuned Gradient Boosting Under

Tuned AdaBoost Under

The validation scores for Gradient Boost and AdaBoost have improved slightly, but the tuned models are now overfitting the train data.

Performance on the test set

I will productionize the Gradient Boost model trained on the under-sampled data.

Pipelines for productionizing the model

Actionable Insights & Recommendations

Once the total transaction count reaches ~75 within 12 months, the chance of the customer leaving the bank decreases.

- Maybe introduce a system that incentivizes customers to make more transactions.

Customers whose total transaction amount within a year reaches ~4,000 USD are likely to remain customers of the bank.

The higher the balance a customer carries over from month to month, the more likely the customer is to stay with the bank.